<!DOCTYPE html>
            
<HTML>
<HEAD>
<meta name="booktitle" content="Developing Applications With Objective Caml" >
 <meta charset="ISO-8859-1"><meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=0">
<META name="GENERATOR" content="hevea 1.05-7 of 2000-02-24">
<META NAME="Author" CONTENT="Christian.Queinnec@lip6.fr">
<LINK rel=stylesheet type="text/css" href="videoc-ocda.css">
<script language="JavaScript" src="videoc.js"><!--
//--></script>
<TITLE>
 Profiling
</TITLE>
</HEAD>
<BODY class="regularBody">
<A HREF="book-ora097.html"><IMG SRC ="previous_motif.gif" ALT="Previous"></A>
<A HREF="index.html"><IMG SRC ="contents_motif.gif" ALT="Contents"></A>
<A HREF="book-ora099.html"><IMG SRC ="next_motif.gif" ALT="Next"></A>
<HR>

<H2> Profiling</H2>
<A NAME="@concepts215"></A>This tool allows measuring a variety of metrics concerning program
execution, including how many times a particular function or control
structure (including conditionals, pattern matchers and loops) are
executed. The results are recorded in a file. By examining this
information, you may be able to locate either algorithmic errors or
crucial locations for optimization.<BR>
<BR>
In order for the profiler to do its work, it is necessary to compile
the code using a special mode that adds profiling instructions.
There are two profiling modes: one for the bytecode compiler,
and the other for the native-code compiler. There are also two
commands used to analyze the results. Analysis of native code will
retrieve the time spent in each function.<BR>
<BR>
Profiling an application therefore proceeds in three stages:
<OL type=1>
<LI>
 compilation in profiling mode;

<LI> program execution;

<LI> presentation of measurements.
</OL><A NAME="toc126"></A>
<H3> Compilation Commands</H3>
<A NAME="@concepts216"></A>
The commands to compile in profiling mode are the following: 
<UL>
<LI>
 <TT>ocamlcp -p </TT> <EM>options</EM> 
for the bytecode compiler;

<LI> <TT>ocamlopt -p </TT> <EM>options</EM> for the native-code compiler. 
</UL>These compilers produce the same type of files as the usual
commands (<A HREF="index.html#chap-Compilation">7</A>). The different options are
described in figure <A HREF="book-ora098.html#fig-prof-opt">10.1</A>. 
<BLOCKQUOTE><DIV ALIGN=center><HR WIDTH="80%" SIZE=2></DIV>
<DIV ALIGN=center>
<TABLE BORDER=1 CELLSPACING=0 CELLPADDING=1>
<TR><TD  ALIGN=left NOWRAP><TT>f</TT></TD>
<TD  ALIGN=left NOWRAP>function call</TD>
</TR>
<TR><TD  ALIGN=left NOWRAP><TT>i</TT></TD>
<TD  ALIGN=left NOWRAP>branch of <B>if</B></TD>
</TR>
<TR><TD  ALIGN=left NOWRAP><TT>l</TT></TD>
<TD  ALIGN=left NOWRAP><B>while</B> and <B>for</B> loops</TD>
</TR>
<TR><TD  ALIGN=left NOWRAP><TT>m</TT></TD>
<TD  ALIGN=left NOWRAP>branches of <B>match</B></TD>
</TR>
<TR><TD  ALIGN=left NOWRAP><TT>t</TT></TD>
<TD  ALIGN=left NOWRAP>branches of <B>try</B></TD>
</TR>
<TR><TD  ALIGN=left NOWRAP><TT>a</TT></TD>
<TD  ALIGN=left NOWRAP>all options</TD>
</TR></TABLE>
</DIV>
<BR>
<DIV ALIGN=center>Figure 10.1: Options of the profiling commands</DIV><BR>

<A NAME="fig-prof-opt"></A>
<DIV ALIGN=center><HR WIDTH="80%" SIZE=2></DIV></BLOCKQUOTE>These indicate which control structures must be taken into account. By
default, the <TT>fm</TT> options are activated.<BR>
<BR>
<A NAME="toc127"></A>
<H3> Program Execution</H3>
<H4> Bytecode Compiler</H4>
<A NAME="@concepts217"></A>
The execution of a program compiled in profiling mode will, if it terminates, produce a
file named <TT>ocamlprof.dump</TT> which contains the information wanted.<BR>
<BR>
We resume the example of the product of a list of integers. We write
the following file <TT>f1.ml</TT>: 


<PRE><BR><B>let</B><CODE> </CODE><B>rec</B><CODE> </CODE>interval<CODE> </CODE>a<CODE> </CODE>b<CODE> </CODE><CODE>=</CODE><CODE> </CODE><BR><CODE> </CODE><CODE> </CODE><B>if</B><CODE> </CODE>b<CODE> </CODE><CODE>&lt;</CODE><CODE> </CODE>a<CODE> </CODE><B>then</B><CODE> </CODE>[]<BR><CODE> </CODE><CODE> </CODE><B>else</B><CODE> </CODE>a::<TT>(</TT>interval<CODE> </CODE><TT>(</TT>a<CODE>+</CODE><CODE>1</CODE><TT>)</TT><CODE> </CODE>b<TT>)</TT>;;<BR><BR><B>exception</B><CODE> </CODE>Found_zero<CODE> </CODE>;;<CODE> </CODE><BR><BR><B>let</B><CODE> </CODE>mult_list<CODE> </CODE>l<CODE> </CODE><CODE>=</CODE><CODE> </CODE><BR><CODE> </CODE><B>let</B><CODE> </CODE><B>rec</B><CODE> </CODE>mult_rec<CODE> </CODE>l<CODE> </CODE><CODE>=</CODE><CODE> </CODE><B>match</B><CODE> </CODE>l<CODE> </CODE><B>with</B><CODE> </CODE><BR><CODE> </CODE><CODE> </CODE><CODE> </CODE><CODE> </CODE>[]<CODE> </CODE>-&gt;<CODE> </CODE><CODE>1</CODE><BR><CODE> </CODE><CODE> </CODE><CODE>|</CODE><CODE> </CODE><CODE>0</CODE><CODE>::_</CODE><CODE> </CODE>-&gt;<CODE> </CODE>raise<CODE> </CODE>Found_zero<BR><CODE> </CODE><CODE> </CODE><CODE>|</CODE><CODE> </CODE>n::x<CODE> </CODE>-&gt;<CODE> </CODE>n<CODE> </CODE><CODE>*</CODE><CODE> </CODE><TT>(</TT>mult_rec<CODE> </CODE>x<TT>)</TT><BR><CODE> </CODE><B>in</B><BR><CODE> </CODE><CODE> </CODE><B>try</B><CODE> </CODE>mult_rec<CODE> </CODE>l<CODE> </CODE><B>with</B><CODE> </CODE>Found_zero<CODE> </CODE>-&gt;<CODE> </CODE><CODE>0</CODE><BR>;;<CODE> </CODE><BR>

</PRE>
<BR>
<BR>
and the file <TT>f2.ml</TT> which uses the functions of <TT>f1.ml</TT>:


<PRE><BR><B>let</B><CODE> </CODE>l1<CODE> </CODE><CODE>=</CODE><CODE> </CODE>F1.interval<CODE> </CODE><CODE>1</CODE><CODE> </CODE><CODE>3</CODE><CODE>0</CODE>;;<BR><B>let</B><CODE> </CODE>l2<CODE> </CODE><CODE>=</CODE><CODE> </CODE>F1.interval<CODE> </CODE><CODE>3</CODE><CODE>1</CODE><CODE> </CODE><CODE>6</CODE><CODE>0</CODE>;;<BR><B>let</B><CODE> </CODE>l3<CODE> </CODE><CODE>=</CODE><CODE> </CODE>l1<CODE> </CODE><CODE>@</CODE><CODE> </CODE><TT>(</TT><CODE>0</CODE>::l2<TT>)</TT>;;<BR><BR>print_int<CODE> </CODE><TT>(</TT>F1.mult_list<CODE> </CODE>l1<TT>)</TT>;;<BR>print_newline();;<BR><BR>print_int<CODE> </CODE><TT>(</TT>F1.mult_list<CODE> </CODE>l3<TT>)</TT>;;<BR>print_newline();;<BR>

</PRE>
<BR>
<BR>
The compilation of these files in profiling mode is shown in the following:
<PRE>
ocamlcp -i -p a -c f1.ml
val profile_f1_ : int array
val interval : int -&gt; int -&gt; int list
exception Found_zero
val mult_list : int list -&gt; int
</PRE>With the <CODE>-p</CODE> option, the compiler adds a new function 
(<TT>profile_f1_</TT>) for the initialization of the counters in module
<TT>F1</TT>. It is the same for file <TT>f2.ml</TT>:
<PRE>
ocamlcp -i -p a -o f2.exe f1.cmo f2.ml
val profile_f2_ : int array
val l1 : int list
val l2 : int list
val l3 : int list
</PRE>
<H4> Native Compiler</H4>
<A NAME="@concepts218"></A>
The native code compilation gives the following result:
<PRE>
$ ocamlopt -i -p  -c f1.ml
val interval : int -&gt; int -&gt; int list
exception Found_zero
val mult_list : int list -&gt; int
$ ocamlopt -i -p -o f2nat.exe f1.cmx f2.ml
</PRE>Only the <TT>-p</TT> option without argument is used. The execution of
<TT>f2nat.exe</TT> produces a file named <TT>gmon.out</TT> which is in a
format that can be handled by the usual Unix commands (see page <A HREF="book-ora098.html#gprof">??</A>).<BR>
<BR>
<A NAME="toc128"></A>
<H3> Presentation of the Results</H3>
Since the information gathered by the two profiling modes
differs, their presentation follows suit. In the first (bytecode) mode
comments on the number of passages through the control structures are
added to the program text. In the second (native) mode, 
the time spent in its body and the number of calls is associated with
each function.<BR>
<BR>

<H4> Bytecode Compiler</H4>
<A NAME="@concepts219"></A>
The <TT>ocamlprof</TT> command gives the analysis of the measurement
results. It uses the information contained in the file <TT>camlprof.dump</TT>. This command takes the source of the program on
entry, then reads the measurements file and produces a new program
text with the desired counts added as comments.<BR>
<BR>
For our example this gives:
<PRE>
ocamlprof f1.ml

let rec interval a b = 
  (* 62 *) if b &lt; a then (* 2 *) []
  else (* 60 *) a::(interval (a+1) b);;

exception Found_zero ;; 

let mult_list l = 
 (* 2 *) let rec mult_rec l = (* 62 *) match l with 
    [] -&gt; (* 1 *) 1
  | 0::_ -&gt; (* 1 *) raise Found_zero
  | n::x -&gt; (* 60 *) n * (mult_rec x)
 in
  try mult_rec l with Found_zero -&gt; (* 1 *) 0
;; 
</PRE>
These counters reflect the calculations done in <TT>F2</TT> quite
well. There are two calls of <TT>mult_list</TT> and 62 of the
auxiliary function <TT>mult_rec</TT>. Examination of the different
branches of the pattern matching show 60 passages through the common case,
one through the pattern [] and the only match where the head is 0,
raising an exception, which can be seen in the counter of the
<B>try</B> statement.<BR>
<BR>
The <TT>ocamlprof</TT> command accepts two options. The first <I>-f
 file</I> indicates the name of the file to contain the
measurements. The second <TT>-F</TT> <EM> string</EM> specifies a string
to add to the comments associated with the control structures treated.<BR>
<BR>

<H4> Native Compilation</H4>
<A NAME="@concepts220"></A>
To get the time spent in the calls of the functions for multiplying
the elements of a list, we write the following file <TT>f3.ml</TT>:


<PRE><BR><B>let</B><CODE> </CODE>l1<CODE> </CODE><CODE>=</CODE><CODE> </CODE>F1.interval<CODE> </CODE><CODE>1</CODE><CODE> </CODE><CODE>3</CODE><CODE>0</CODE>;;<BR><B>let</B><CODE> </CODE>l2<CODE> </CODE><CODE>=</CODE><CODE> </CODE>F1.interval<CODE> </CODE><CODE>3</CODE><CODE>1</CODE><CODE> </CODE><CODE>6</CODE><CODE>0</CODE>;;<BR><B>let</B><CODE> </CODE>l3<CODE> </CODE><CODE>=</CODE><CODE> </CODE>l1<CODE> </CODE><CODE>@</CODE><CODE> </CODE><TT>(</TT><CODE>0</CODE>::l2<TT>)</TT>;;<BR><BR><B>for</B><CODE> </CODE>i<CODE>=</CODE><CODE>0</CODE><CODE> </CODE><B>to</B><CODE> </CODE><CODE>1</CODE><CODE>0</CODE><CODE>0</CODE><CODE>0</CODE><CODE>0</CODE><CODE>0</CODE><CODE> </CODE><B>do</B><CODE> </CODE><BR><CODE> </CODE><CODE> </CODE>F1.mult_list<CODE> </CODE>l1;<BR><CODE> </CODE><CODE> </CODE>F1.mult_list<CODE> </CODE>l3<BR><B>done</B>;;<BR><BR>print_int<CODE> </CODE><TT>(</TT>F1.mult_list<CODE> </CODE>l1<TT>)</TT>;;<BR>print_newline();;<BR><BR>print_int<CODE> </CODE><TT>(</TT>F1.mult_list<CODE> </CODE>l3<TT>)</TT>;;<BR>print_newline();;<BR>

</PRE>

This is the same file as <TT>f2.ml</TT> with a loop of 100000 iterations. <BR>
<BR>
Execution of the program creates the file <TT>gmon.out</TT>. This is in a
format readable by <TT>gprof</TT>, a command that can be found on
Unix systems. The following call to <TT>gprof</TT><A NAME="gprof"></A> prints
information about the time spent and the call graph. Since the output
is rather long, we show only the first page which contains the name of
the functions that are called at least once and the time spent in each.
<A NAME="gprof"></A>

<PRE>
$ gprof  f3nat.exe 
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  us/call  us/call  name    
 92.31      0.36     0.36   200004     1.80     1.80  F1_mult_rec_45
  7.69      0.39     0.03   200004     0.15     1.95  F1_mult_list_43
  0.00      0.39     0.00     2690     0.00     0.00  oldify
  0.00      0.39     0.00      302     0.00     0.00  darken
  0.00      0.39     0.00      188     0.00     0.00  gc_message
  0.00      0.39     0.00      174     0.00     0.00  aligned_malloc
  0.00      0.39     0.00      173     0.00     0.00  alloc_shr
  0.00      0.39     0.00      173     0.00     0.00  fl_allocate
  0.00      0.39     0.00       34     0.00     0.00  caml_alloc3
  0.00      0.39     0.00       30     0.00     0.00  caml_call_gc
  0.00      0.39     0.00       30     0.00     0.00  garbage_collection
...
</PRE>
The main lesson is that almost all of the execution time is spent in
the function <TT>F1_mult_rec_45</TT>, which corresponds to the
function <TT>F1.mult_rec</TT> in file <TT>f1.ml</TT>. On the other hand
we recognize a lot of other functions that are called. The first on
the list are memory management functions in the runtime library
(<A HREF="index.html#chap-GC">9</A>).<BR>
<BR>
<BR>
<BR>

<BR>
<BR>
<HR>
<A HREF="book-ora097.html"><IMG SRC ="previous_motif.gif" ALT="Previous"></A>
<A HREF="index.html"><IMG SRC ="contents_motif.gif" ALT="Contents"></A>
<A HREF="book-ora099.html"><IMG SRC ="next_motif.gif" ALT="Next"></A>
</BODY>
</HTML>
